Introduction

Goal: To develop a model for predicting life expectancy in Baltimore down to single block resolution, with estimates of uncertainty. You may need to develop an approach for “downscaling”, since the outcome data you’ll be able to find is likely aggregated at the neighborhood level.

Data

We have data from the Baltimore city website, the Baltimore Neighborhood Indicators Alliance (BNIA-JF), and the Maryland Department of Planning. The data consist of life expectancy estimates for each neighbourhood, along with crime, economic development and education information, all over a 5 year period (2010-2014). I also have street level, and thus block level, data. In addition, I have information which links streets to blocks and then to neighbourhoods.

Descriptives

The goal of this analysis is to predict life expectancy at the street block level, so I first checked how the blocks in my dataset are defined. Since some of the data files have information on neighbourhood blocks, I plotted the neighbourhoods as delineated by the block level data from the Baltimore city website and then overlaid the neighbourhood data from the Maryland Department of Planning. Furthermore, using information from the Baltimore gisdata website, I was able to obtain what “block” was actually defined as. All of this points to the possibility of using blocks from our dataset as street blocks.

For more plots examining the fits, please visit my GitHub repo.

All of this indicates a good fit. I also used GIS data from the Baltimore city website and found that each block was defined as a street block. Figure: an example of a city block pulled from the dataset.

Analysis

Checking for Spatial correlation

Since we have spatial data, I ran both the Mantel test (cf. Mantel 1966) and Moran’s I (cf. Moran 1950) to examine whether spatial autocorrelation exists in this dataset. Please note that while both tests measure spatial autocorrelation, they refer to quite different concepts.

Mantel’s test (Mantel 1966; Dutilleul et al. 2000) gives the correlation between two distance matrices; that is, it judges whether closeness in one set of variables is related to closeness in another set of variables. Applied to our dataset, we can use it to see whether samples that are close in terms of geographic location are also close in terms of life expectancy, i.e. to test whether the distance matrix based on life expectancy values is correlated with the distance matrix based on spatial location for the CSAs (Community Statistical Areas).

## Monte-Carlo test
## Call: ade4::mantel.randtest(m1 = csa.dists, m2 = le11.dists, nrepet = 9999)
## 
## Observation: 0.1271281 
## 
## Based on 9999 replicates
## Simulated p-value: 0.0408 
## Alternative hypothesis: greater 
## 
##      Std.Obs  Expectation     Variance 
## 1.8244763700 0.0003022149 0.0048321349

Based on these results, we can reject, at alpha = 0.05, the null hypothesis that the two matrices, spatial distance and life expectancy distance (2011), are unrelated. The observed correlation, r = 0.127, suggests that the matrix entries are positively associated: pairs of CSAs that are close to each other generally show smaller differences in life expectancy than pairs that are far from each other. Note that since this test is based on random permutations, the same code will always arrive at the same observed correlation but rarely the same p-value (unless the random seed is fixed). Furthermore, I ran this test for all four years in the dataset and the conclusions are consistent. If you are interested in the correlation values for those years, here is the code.
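The permutation logic behind the test can be sketched in a few lines of base R. This is only an illustration of the idea, not the `ade4` implementation: the data are simulated, and the names `coords`, `life_exp` and `mantel_perm` are hypothetical. Fixing the seed with `set.seed()` is what makes the p-value reproducible from run to run.

```r
# Illustrative permutation Mantel test on simulated data (not the real CSA data).
set.seed(1)                                    # fixes the permutations
n        <- 55
coords   <- cbind(x = runif(n), y = runif(n))  # made-up centroids
life_exp <- 70 + 5 * coords[, "x"] + rnorm(n, sd = 2)  # made-up outcome with a spatial trend

geo_d <- as.matrix(dist(coords))    # pairwise geographic distances
le_d  <- as.matrix(dist(life_exp))  # pairwise absolute differences in life expectancy

mantel_perm <- function(d1, d2, nrepet = 999) {
  lower <- lower.tri(d1)                   # each pair of units counted once
  obs   <- cor(d1[lower], d2[lower])       # observed matrix correlation
  sims  <- replicate(nrepet, {
    p <- sample(nrow(d2))                  # relabel the units of one matrix
    cor(d1[lower], d2[p, p][lower])
  })
  # one-sided ("greater") Monte Carlo p-value
  list(obs = obs, p.value = (sum(sims >= obs) + 1) / (nrepet + 1))
}

res <- mantel_perm(geo_d, le_d)
res$obs      # does not depend on the permutations
res$p.value  # depends on the permutations drawn
```

Permuting rows and columns together keeps each matrix internally consistent while breaking any link between the two, which is why the observed correlation stays fixed and only the p-value moves between runs.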

Moran’s I (Moran 1950) is useful when one wants to know the correlation of a variable with itself through space, i.e. the extent to which the occurrence of an event in one areal unit makes the occurrence of an event in a neighbouring areal unit more or less likely. For example, if life expectancy is low in one part of the north, are we likely to see low life expectancy in the rest of that region? The null hypothesis is that there is no spatial autocorrelation.

## $observed
## [1] 0.08941123
## 
## $expected
## [1] -0.01851852
## 
## $sd
## [1] 0.01710081
## 
## $p.value
## [1] 2.765532e-10

Based on these results (observed I = 0.089 against an expected value of -0.019 under the null, p < 0.001), we can reject the null hypothesis that there is zero spatial autocorrelation present in life expectancy at the 5% level of significance. For more tests using data from 2011 to 2014, please check here.
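To make the statistic itself concrete, here is a minimal base R sketch of Moran’s I on simulated data. The inverse-distance weights, the simulated `coords` and `life_exp`, and the helper `morans_i` are assumptions for illustration; the source does not say which weights or function produced the output above. The expected value under the null is -1/(n - 1), which for n = 55 units equals the -0.01851852 reported above.

```r
# Illustrative Moran's I on simulated data (not the real CSA data).
set.seed(2)
n        <- 55
coords   <- cbind(runif(n), runif(n))                  # made-up centroids
life_exp <- 70 + 5 * coords[, 1] + rnorm(n, sd = 0.5)  # made-up outcome with a spatial trend

w <- 1 / as.matrix(dist(coords))  # inverse-distance weights (an assumed choice)
diag(w) <- 0                      # a unit is not its own neighbour

morans_i <- function(x, w) {
  z <- x - mean(x)                                  # deviations from the mean
  (length(x) / sum(w)) * sum(w * outer(z, z)) / sum(z^2)
}

obs      <- morans_i(life_exp, w)
expected <- -1 / (n - 1)          # expectation under no spatial autocorrelation
c(observed = obs, expected = expected)
```

An observed value well above the expected one points to positive spatial autocorrelation; judging significance additionally needs the standard deviation under the null, which this sketch does not compute.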

Regression Models for spatial data

Geographically Weighted Regression (GWR)

  • The structure of the model may not remain constant over the study area (there can be local variations in the parameter estimates).
  • To account for this potential spatial heterogeneity we use the GWR model (Fotheringham, Brunsdon, and Charlton 2002).
  • GWR permits the parameter estimates to vary locally.

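This model uses a weighted least squares (WLS) approach to account for spatial heterogeneity and is as follows \[ Y_i = x_i^T\beta_i + \epsilon_i \] where i is the location, \(x_i^T\) is the i-th row of the design matrix \(X\), and \(\beta_i\) is estimated using the WLS approach. Thus \[ \hat{\beta}_i = (X^TW_iX)^{-1}X^TW_iY \] where \(W_i\) is the diagonal weight matrix for location i, whose entries give more weight to observations closer to location i.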
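The local WLS estimate can be sketched directly from the formula in base R. This is a minimal illustration on simulated data, assuming a Gaussian kernel and a fixed bandwidth `h`; neither choice comes from the source, and `gwr_beta` is a hypothetical helper rather than a function from a GWR package.

```r
# Illustrative GWR-style local estimates on simulated data.
set.seed(3)
n      <- 55
coords <- cbind(runif(n), runif(n))           # made-up locations
x      <- rnorm(n)                            # made-up predictor
slope  <- 1 + 2 * coords[, 1]                 # true slope drifts across the study area
y      <- 70 + slope * x + rnorm(n, sd = 0.1)

X <- cbind(1, x)                 # design matrix with intercept
D <- as.matrix(dist(coords))     # distances between locations
h <- 0.3                         # kernel bandwidth (assumed, not tuned)

gwr_beta <- function(i) {
  w   <- exp(-0.5 * (D[i, ] / h)^2)   # diagonal of W_i: Gaussian kernel weights
  XtW <- t(X * w)                     # X^T W_i without building the n x n matrix
  drop(solve(XtW %*% X, XtW %*% y))   # (X^T W_i X)^{-1} X^T W_i Y
}

betas <- t(sapply(seq_len(n), gwr_beta))  # one row of estimates per location
colnames(betas) <- c("intercept", "slope")
head(betas)
```

Each row of `betas` is the same as a weighted `lm()` fit centred on that location, so the local slopes can be mapped to see where the relationship changes. In practice the bandwidth would be chosen by cross-validation or a similar criterion.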

Methods for Downscaling

  • Delta method: First we find the model that fits the data best, using aggregated data. We then predict what the outcome would be after we remove one of the blocks from the aggregated data, call this \[ T_{-b} = E(Y)_{-b} \] and find the predicted life expectancy for the removed block as \[ T_{b} = T_{full} - T_{-b} \] where \(T_{full}\) is the prediction from the full aggregated data.

  • Transfer function: Find which aggregated predictors provide the best fit, then use a “transfer function” to map the aggregated variables to the block level and use the resulting values as predictors to get block level estimates.
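One possible reading of the leave-one-block-out difference described above is sketched below in base R. Everything here is assumed for illustration: the data are simulated, a single predictor `vacancy` stands in for whatever aggregated model fits best, and `predict_agg` is a hypothetical helper.

```r
# Illustrative leave-one-block-out difference, T_b = T_full - T_{-b}.
set.seed(4)
# Simulated neighbourhood-level training data (all values made up)
nbhd <- data.frame(vacancy = runif(55, 0, 0.4))
nbhd$life_exp <- 78 - 20 * nbhd$vacancy + rnorm(55)
fit <- lm(life_exp ~ vacancy, data = nbhd)   # model fitted on aggregated data

# Hypothetical blocks of one neighbourhood, with property counts per block
blocks <- data.frame(block  = c("b1", "b2", "b3", "b4"),
                     vacant = c(2, 10, 0, 5),
                     total  = c(40, 30, 50, 25))

# Aggregate the given blocks to one vacancy rate and predict from it
predict_agg <- function(b) {
  newdata <- data.frame(vacancy = sum(b$vacant) / sum(b$total))
  unname(predict(fit, newdata))
}

t_full  <- predict_agg(blocks)                       # T_full
t_minus <- sapply(seq_len(nrow(blocks)),
                  function(i) predict_agg(blocks[-i, ]))  # T_{-b} for each block
t_b     <- t_full - t_minus                          # T_b for each block
names(t_b) <- blocks$block
t_b
```

Under this reading, each `t_b` is the change in the neighbourhood prediction attributable to block b (negative for a high-vacancy block, positive for a low-vacancy one). Turning that into a block-level life expectancy would need a further step that the description leaves open.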

Datasets

| Name | Information | Type | Data Source | Geographic Scale | Date |
|---|---|---|---|---|---|
| Real Property Taxes | Contains information on which streets belong to which block and in what neighbourhood, along with their longitude and latitude. Also has information on police district. | Dataset | Baltimore city website | Street level | 2016 |
| Real Property | Contains the City of Baltimore parcel boundaries, with ownership, address, valuation and other property information. It also contains street block definitions. | Dataset | Baltimore gisdata website | Street level | 2016 |
| Census Block | GIS shapefile which has information on census block designation for 2010 | Shapefile | Maryland Department of Planning | Block level | 2010 |
| Neighborhood | Polygon feature representing the boundaries of Baltimore City’s neighborhoods as of the year 2010 | Shapefile | Baltimore city website | Neighborhood level | 2010 |
| Census Demographics for 2010 to 2014 | Contains neighborhood level demographics data | Dataset | Baltimore Neighborhood Indicators Alliance BNIA-JF | Neighborhood level | 2010-2014 |
| Children and Family Health & Well-Being | Has information on life expectancy for 2010 to 2014 | Dataset | Baltimore Neighborhood Indicators Alliance BNIA-JF | Neighborhood level | 2010-2014 |
| BNIA Vital Signs Codebook | Contains information on short variable names and their corresponding full names, along with their sources for each dataset | Dataset | Baltimore city website | Neighborhood level | 2016 |
| Housing and Community Development | Has information on the state of households in Baltimore city, viz. Number of Homes Sold, Percentage of Residential Properties that are Vacant and Abandoned, Percent Residential Properties that do Not Receive Mail, etc. | Dataset | Baltimore Neighborhood Indicators Alliance BNIA-JF | Neighborhood level | 2010-2014 |
| BNIA Data linking CSA to Neighborhoods | Has information on which neighborhoods belong to what CSA. Note that a neighborhood may belong to more than one CSA. | Dataset | Baltimore Neighborhood Indicators Alliance BNIA-JF | CSA and Neighborhood level | 2010 |

devtools::session_info()
## Session info --------------------------------------------------------------
##  setting  value                       
##  version  R version 3.3.1 (2016-06-21)
##  system   x86_64, mingw32             
##  ui       RTerm                       
##  language (EN)                        
##  collate  English_United States.1252  
##  tz       America/New_York            
##  date     2016-10-03
## Packages ------------------------------------------------------------------
##  package       * version date       source        
##  colorspace      1.2-6   2015-03-11 CRAN (R 3.3.1)
##  devtools      * 1.12.0  2016-06-24 CRAN (R 3.3.1)
##  digest          0.6.9   2016-01-08 CRAN (R 3.3.1)
##  downloader    * 0.4     2015-07-09 CRAN (R 3.3.1)
##  evaluate        0.9     2016-04-29 CRAN (R 3.3.1)
##  foreign         0.8-66  2015-08-19 CRAN (R 3.3.0)
##  formatR         1.4     2016-05-09 CRAN (R 3.3.1)
##  geosphere       1.5-5   2016-06-15 CRAN (R 3.3.1)
##  ggmap         * 2.6.1   2016-01-23 CRAN (R 3.3.1)
##  ggplot2       * 2.1.0   2016-03-01 CRAN (R 3.3.1)
##  gtable          0.2.0   2016-02-26 CRAN (R 3.3.1)
##  htmltools       0.3.5   2016-03-21 CRAN (R 3.3.1)
##  jpeg            0.1-8   2014-01-23 CRAN (R 3.3.0)
##  knitr           1.13    2016-05-09 CRAN (R 3.3.1)
##  lattice         0.20-33 2015-07-14 CRAN (R 3.3.1)
##  lubridate     * 1.5.6   2016-04-06 CRAN (R 3.3.1)
##  magrittr        1.5     2014-11-22 CRAN (R 3.3.1)
##  mapproj         1.2-4   2015-08-03 CRAN (R 3.3.1)
##  maps            3.1.0   2016-02-13 CRAN (R 3.3.1)
##  maptools      * 0.8-39  2016-01-30 CRAN (R 3.3.1)
##  memoise         1.0.0   2016-01-29 CRAN (R 3.3.1)
##  munsell         0.4.3   2016-02-13 CRAN (R 3.3.1)
##  plyr            1.8.4   2016-06-08 CRAN (R 3.3.1)
##  png             0.1-7   2013-12-03 CRAN (R 3.3.0)
##  proto           0.3-10  2012-12-22 CRAN (R 3.3.0)
##  RColorBrewer  * 1.1-2   2014-12-07 CRAN (R 3.3.0)
##  Rcpp            0.12.5  2016-05-14 CRAN (R 3.3.1)
##  readr         * 0.2.2   2015-10-22 CRAN (R 3.3.1)
##  readxl        * 0.1.1   2016-03-28 CRAN (R 3.3.1)
##  reshape2        1.4.1   2014-12-06 CRAN (R 3.3.1)
##  RevoUtils       10.0.1  2016-08-24 local         
##  RevoUtilsMath * 8.0.3   2016-04-13 local         
##  RgoogleMaps     1.2.0.7 2015-01-21 CRAN (R 3.3.1)
##  rjson           0.2.15  2014-11-03 CRAN (R 3.3.0)
##  RJSONIO         1.3-0   2014-07-28 CRAN (R 3.3.0)
##  rmarkdown       0.9.6   2016-05-01 CRAN (R 3.3.1)
##  scales          0.4.0   2016-02-26 CRAN (R 3.3.1)
##  sp            * 1.2-3   2016-04-14 CRAN (R 3.3.1)
##  stringi         1.1.1   2016-05-27 CRAN (R 3.3.0)
##  stringr         1.0.0   2015-04-30 CRAN (R 3.3.1)
##  withr           1.0.2   2016-06-20 CRAN (R 3.3.1)
##  yaml            2.1.13  2014-06-12 CRAN (R 3.3.1)

References

Dutilleul, Pierre, Jason Stockwell, Dominic Frigon, and Pierre Legendre. 2000. “The Mantel Test Versus Pearson’s Correlation Analysis: Assessment of the Differences for Biological and Environmental Studies.” Journal of Agricultural, Biological, and Environmental Statistics 5 (June). International Biometric Society: 131–50. http://www.jstor.org/stable/1400528.

Fotheringham, A. Stewart, Chris Brunsdon, and Martin Charlton. 2002. Geographically Weighted Regression: The Analysis of Spatially Varying Relationships. Wiley.

Mantel, Nathan. 1966. “The Detection of Disease Clustering and a Generalized Regression Approach.” American Association for Cancer Research, September.

Moran, Patrick Alfred Pierce. 1950. “Notes on Continuous Stochastic Phenomena.” Biometrika 37 (June). Oxford University Press on behalf of Biometrika Trust: 17–23. http://www.jstor.org/stable/2332142.